Abstract. Visual analytics is a subdomain of data analysis which combines
both human and machine analytical abilities and is applied mostly
in decision-making and data mining tasks. Triclustering, based on Formal
Concept Analysis (FCA), was developed to detect groups of objects with
similar properties under similar conditions. It is used in Social Network
Analysis (SNA) and is a basis for certain types of recommender systems.
The problem with triclustering algorithms is that they do not always
produce meaningful clusters. This article describes a specific triclustering
algorithm and a prototype of a visual analytics platform for working
with the obtained clusters. The tool is designed as a testing framework
and is intended to help an analyst grasp the results of triclustering
and recommender algorithms, and decide on the meaningfulness
of certain triclusters and recommendations.

Keywords: visual analytics, formal concept analysis, triclustering, social 
network analysis. 


1 Introduction 

Classical Formal Concept Analysis (FCA) deals with data which describe a 
relationship between a set of objects and a set of attributes and provides methods 
to derive a concept hierarchy or formal ontology in them [T] . FCA is a powerful 
tool for revealing dependencies in data and is commonly applied to data mining
(in particular, text mining), machine learning, knowledge management, the
Semantic Web, software development, and biology.

As a natural extension of FCA, Triadic Concept Analysis (TCA) manages
triadic data in the form of objects, their attributes, and conditions under which
these objects have certain attributes [5]. A common example is social network
analysis with a context including users (objects), events they take part in
(attributes) and interests (which might be regarded as conditions under which a
user participates in a certain event).

As the task of finding all concepts or triconcepts is computationally
challenging, certain relaxations of these notions have been introduced: biclusters
and triclusters [1]. Here we address triclusters, i.e. combinations of sets of
objects, their attributes, and conditions where not every object must have each
attribute. Triclustering provides an output in the form of object clusters with
similar attributes under similar conditions. Therefore, it is applied to mining
users with common interests, applicants with similar competences or books
labelled with similar tags. Triclustering is also a basis for a certain type of
recommender systems.

Visual analytics is an increasingly popular branch of Computer Science which
combines both human and computer qualities to solve a range of problems that
might lie beyond the power of man or machine separately. It is a subdomain
of data analysis focusing on decision-making through data preprocessing,
data mining and interactive user interfaces. For instance, Siemens PLM software
allows developers to collect, process, and visualize report data in the 3D
engineering environment, and make real-time decisions in the process of
developing new vehicles. The same approach is used in situational and
decision-making centres, in nuclear power engineering, and in crime investigations.

In this paper, we explore these topics and describe a framework which uses 
visual analytics to solve some problems in FCA. 

2 Visual analytics 

2.1 Definition and specificity 

Generalizing and selecting crucial aspects of various definitions of visual
analytics, we propose the following one:

Visual analytics is a subdomain of data analysis focusing on analytical
reasoning on the basis of interactive user interfaces in the process of data
mining, data preprocessing, knowledge representation, discovering dependencies,
and decision-making.

Let us further consider the core peculiarities of visual analytics and the tasks it
is designed to solve:

1. Visual analytics usually deals with complicated problems involving large
amounts of data and requiring both human and machine resources.

2. The final goal of visual analytics is to enable users to obtain deep insight
into the problems to be solved, which might include processing large amounts
of data from various sources. For this purpose visual analytics combines
both human and technological resources. On the one hand, data mining and
statistics are the driving force of any automatic data analysis. On the other
hand, the human brain's aptitude for perceiving information and discovering
dependencies in data complements machine techniques and thus turns visual
analytics into a promising sphere for further development.

3. As it develops, visual analytics in turn fosters the development of data
mining, data representation and visualization, and analytical reporting.

4. Visual analytics also deals with human cognition, information perception,
Computer Science, and interactive and graphical design.

5. Visual analytics combines methods of information visualization and
graphical data representation, where visualization fosters human perception by
the following means:

(a) Enlarging the capacity of data resources, so that the user has to memorize less

(b) Reducing search, such as by representing a large amount of data in a small
space

(c) Enhancing recognition of patterns, such as when information is organized
in space by its time relationships

(d) Supporting easy relationship inference

(e) Monitoring large amounts of potential events

(f) Providing techniques for dynamic data monitoring

2.2 Siemens

Siemens uses visual analytics techniques in its product lifecycle management
(PLM) software, enabling developers to collect, process, and visualize report data in
the 3D engineering environment, and make real-time decisions in the process of
developing new vehicles.



Fig. 1. One of the development stages with Siemens PLM Software


The crucial point is that this system allows real-time visual interaction. This
speeds up both testing products against given criteria and eliminating
product quality problems.

2.3 Supernova modelling 

A highly powerful implementation of the visual analytics paradigm was carried out
by astrophysicists in the Terascale Supernova Initiative (TSI) project. The goal of
the project is to give scientists from various fields access to powerful computational
resources in order to produce knowledge in the sphere of fundamental science. In
particular, the question of supernova birth was studied, which encompassed 3D
turbulence, gravitation and magnetic field modelling. The scale of the
investigation was impressive: the modelling resulted in terabytes of data. Analysing
such an amount of data lies beyond human power, but combining human and
machine capabilities made it possible to draw some inferences from this bulk of data.

^ http://www.plm.automation.siemens.com

^ science.energy.gov/~/media/ascr/ascac/pdf/meetings/mar03/Mezzacappa.pdf


3 Formal Concept Analysis and OA-biclustering 


3.1 Main definitions 


A formal context in FCA is a triple $K = (G, M, I)$, where $G$ is a set of
objects, $M$ is a set of attributes, and the binary relation $I \subseteq G \times M$ shows
which object possesses which attribute; $gIm$ denotes that object $g$ has attribute
$m$. For subsets of objects and attributes $A \subseteq G$ and $B \subseteq M$, the Galois operators
are defined as follows:

$$A' = \{m \in M \mid gIm \ \ \forall g \in A\},$$

$$B' = \{g \in G \mid gIm \ \ \forall m \in B\}.$$


A pair $(A, B)$ such that $A \subseteq G$, $B \subseteq M$, $A' = B$ and $B' = A$ is called a formal
concept of the context $K$. The sets $A$ and $B$ are closed and called the extent and
the intent of the formal concept $(A, B)$, respectively. For a set of objects $A$, the
set of their common attributes $A'$ describes the similarity of the objects of
$A$, and the closed set $A''$ is a cluster of similar objects (with the set of common
attributes $A'$).
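The definitions above can be illustrated with a minimal Python sketch. It stores a context as a set of (object, attribute) pairs, implements the two derivation operators, and checks the formal-concept condition. The toy context and the function names `prime_objects`, `prime_attributes` and `is_formal_concept` are assumptions made for illustration; they are not part of the framework described in this paper.

```python
# Toy formal context (hypothetical data, for illustration only).
G = {"g1", "g2", "g3"}
M = {"a", "b", "c"}
I = {("g1", "a"), ("g1", "b"), ("g2", "a"), ("g2", "b"), ("g3", "c")}


def prime_objects(A):
    """A': attributes shared by every object in A."""
    return {m for m in M if all((g, m) in I for g in A)}


def prime_attributes(B):
    """B': objects that have every attribute in B."""
    return {g for g in G if all((g, m) in I for m in B)}


def is_formal_concept(A, B):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return prime_objects(A) == set(B) and prime_attributes(B) == set(A)


A = {"g1"}
common = prime_objects(A)            # A'
closure = prime_attributes(common)   # A'': the cluster of objects similar to g1
```

In this toy context the closure of {g1} is {g1, g2}, and ({g1, g2}, {a, b}) satisfies the formal-concept condition while ({g1}, {a, b}) does not.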

The number of formal concepts of a context $K = (G, M, I)$ can be quite large
($2^{\min\{|G|,|M|\}}$ in the worst case), and the problem of computing this number
is #P-complete. There exist some ways to reduce the number of formal
concepts, for instance, choosing concepts by stability index or extent size.
An alternative way is to relax the definition of a formal
concept. One such relaxation is an OA-bicluster [3].

If $(g, m) \in I$, then $(m', g')$ is called an object-attribute bicluster with the density

$$\rho(m', g') = \frac{|I \cap (m' \times g')|}{|m'| \cdot |g'|}.$$

Bicluster density is the share of the object-attribute pairs of a bicluster that
are present in the initial context.

Here are the main properties of OA-biclusters: 

1. For any bicluster $(A, B) \in 2^G \times 2^M$ it is true that $0 \le \rho(A, B) \le 1$,

2. An OA-bicluster $(m', g')$ is a formal concept if $\rho = 1$,

3. If $(m', g')$ is a bicluster, then $(g'', g') \le (m', m'')$.


A bicluster $(A, B)$ is called dense if its density is greater than a predefined
minimum threshold, i.e. $\rho(A, B) > \rho_{\min}$. The above-mentioned properties show
that OA-biclusters differ from formal concepts since unit density is not required.
An illustrative example for triconcepts and triclusters is given in Section 4.1.
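The OA-bicluster construction and its density can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the toy context and the helper names `attrs_of`, `objs_of`, `oa_bicluster` and `density` are hypothetical and are not taken from the authors' implementation.

```python
# Toy formal context (hypothetical data, for illustration only).
I = {("g1", "a"), ("g1", "b"), ("g2", "a"), ("g3", "b"), ("g3", "c")}


def attrs_of(g):
    """g': all attributes of object g."""
    return {m for (x, m) in I if x == g}


def objs_of(m):
    """m': all objects that have attribute m."""
    return {g for (g, y) in I if y == m}


def oa_bicluster(g, m):
    """OA-bicluster (m', g') generated by a pair (g, m) from I."""
    if (g, m) not in I:
        raise ValueError("a bicluster is generated only by pairs from I")
    return objs_of(m), attrs_of(g)


def density(objects, attributes):
    """Share of the cells of objects x attributes that belong to I."""
    hits = sum((g, m) in I for g in objects for m in attributes)
    return hits / float(len(objects) * len(attributes))


extent, intent = oa_bicluster("g1", "a")
rho = density(extent, intent)
```

Here the pair (g1, a) generates the bicluster ({g1, g2}, {a, b}); three of its four cells belong to the relation, so its density is 0.75 and it is not a formal concept.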



4 Triadic FCA and OAC-triclustering 


As a solution for three-way data in FCA, Triadic Concept Analysis (TCA)
was introduced [5].

A triadic context $K = (G, M, B, I)$ consists of sets $G$ (objects), $M$
(attributes), $B$ (conditions), and a ternary relation $I \subseteq G \times M \times B$. An incidence
$(g, m, b) \in I$ shows that the object $g$ has the attribute $m$ under condition $b$.

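We denote a triadic context by $(X_1, X_2, X_3, I)$. A triadic context $K = (X_1, X_2, X_3, I)$
gives rise to the following dyadic contexts:

$$K^{(1)} = (X_1, X_2 \times X_3, I^{(1)}),$$

$$K^{(2)} = (X_2, X_1 \times X_3, I^{(2)}),$$

$$K^{(3)} = (X_3, X_1 \times X_2, I^{(3)}),$$

where $g I^{(1)} (m, b) \Leftrightarrow m I^{(2)} (g, b) \Leftrightarrow b I^{(3)} (g, m) \Leftrightarrow (g, m, b) \in I$.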

The derivation operators (or prime operators) induced by $K^{(i)}$ are denoted by
$(\cdot)^{(i)}$. For each induced dyadic context we have two kinds of derivation operators.

That is, for $\{i, j, k\} = \{1, 2, 3\}$ with $j < k$ and for $Z \subseteq X_i$ and $W \subseteq X_j \times X_k$,
the $(i)$-derivation operators are defined by:

$$Z \mapsto Z^{(i)} = \{(x_j, x_k) \in X_j \times X_k \mid x_i, x_j, x_k \text{ are related by } I \text{ for all } x_i \in Z\},$$

$$W \mapsto W^{(i)} = \{x_i \in X_i \mid x_i, x_j, x_k \text{ are related by } I \text{ for all } (x_j, x_k) \in W\}.$$

A triadic concept of a triadic context $K = (X_1, X_2, X_3, I)$ is a triple $(A_1, A_2, A_3)$
with $A_1 \subseteq X_1$, $A_2 \subseteq X_2$, $A_3 \subseteq X_3$ such that for every $\{i, j, k\} = \{1, 2, 3\}$ with
$j < k$ we have $A_i = (A_j \times A_k)^{(i)}$.

$A_1$, $A_2$ and $A_3$ are called the extent, the intent and the modus of $(A_1, A_2, A_3)$.

A triple $T = ((m, b)', (g, b)', (g, m)')$ for a triple $(g, m, b) \in I$ is called an
OAC-tricluster (or object-attribute-condition tricluster, or just tricluster) based on
prime operators. Here

$$(g, m)' = \{b \mid (g, m, b) \in I\},$$

$$(g, b)' = \{m \mid (g, m, b) \in I\},$$

$$(m, b)' = \{g \mid (g, m, b) \in I\}.$$


The density of a tricluster $(A, B, C)$ of a triadic context $K = (G, M, B, I)$ is
given by the fraction of the triples of the tricluster that belong to $I$, that is

$$\rho(A, B, C) = \frac{|I \cap (A \times B \times C)|}{|A| \, |B| \, |C|}.$$

The tricluster $T = (A, B, C)$ is called dense if its density is greater than a
predefined minimum threshold, i.e. $\rho(T) > \rho_{\min}$. Similarly to biclusters,
triclusters have the following properties:

1. For every triconcept $(A, B, C)$ of a triadic context $K = (G, M, B, I)$ with
nonempty sets $A$, $B$ and $C$ we have $\rho(A, B, C) = 1$,

2. For every tricluster $(A, B, C)$ of a triadic context $K = (G, M, B, I)$ with
nonempty sets $A$, $B$ and $C$ we have $0 \le \rho(A, B, C) \le 1$.
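The prime operators on pairs and the tricluster density can be sketched as follows. The triadic context below is hypothetical toy data, and the helper names `prime_gm`, `prime_gb`, `prime_mb`, `tricluster` and `density` are introduced only for this illustration.

```python
# Toy triadic context (hypothetical data): (object, attribute, condition) triples.
I = {
    ("g1", "m1", "b1"), ("g1", "m1", "b2"),
    ("g1", "m2", "b1"), ("g2", "m1", "b1"),
}


def prime_gm(g, m):
    """(g, m)': conditions under which object g has attribute m."""
    return {b for (x, y, b) in I if x == g and y == m}


def prime_gb(g, b):
    """(g, b)': attributes that object g has under condition b."""
    return {m for (x, m, z) in I if x == g and z == b}


def prime_mb(m, b):
    """(m, b)': objects that have attribute m under condition b."""
    return {g for (g, y, z) in I if y == m and z == b}


def tricluster(g, m, b):
    """OAC-tricluster ((m, b)', (g, b)', (g, m)') generated by (g, m, b)."""
    return prime_mb(m, b), prime_gb(g, b), prime_gm(g, m)


def density(A, B, C):
    """Share of the cells of the box A x B x C that belong to I."""
    hits = sum((g, m, b) in I for g in A for m in B for b in C)
    return hits / float(len(A) * len(B) * len(C))


T = tricluster("g1", "m1", "b1")
rho = density(*T)
```

For this data the triple (g1, m1, b1) generates the tricluster ({g1, g2}, {m1, m2}, {b1, b2}) with density 4/8 = 0.5, whereas ({g1}, {m1}, {b1, b2}) is a triconcept and, in line with property 1, has density 1.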



4.1 Example 


Let us consider a sample context $K = (U, I, S, Y)$, where $U = \{\text{Ed}, \text{Leo}, \text{Max}\}$
is a set of users, $I = \{\text{soccer}, \text{hockey}\}$ is the set of their interests,
$S = \{\text{soccer.com}, \text{nhl.com}, \text{fifa.com}, \text{hockeycanada.ca}\}$ is the set of sites they have
added to bookmarks, and $Y \subseteq U \times I \times S$ is a ternary relation between $U$, $I$, $S$
which can be expressed by Table 1.

Table 1. Sample context. Designations: $u_1$ - Ed, $u_2$ - Leo, $u_3$ - Max, $i_1$ - soccer,
$i_2$ - hockey, $s_1$ - soccer.com, $s_2$ - nhl.com, $s_3$ - fifa.com, $s_4$ - hockeycanada.ca.


Here, in general, we have $|U| \cdot |I| \cdot |S| = 24$ triples to analyze. In fact, only
11 of them are present in the relation; the rest are void.

Users Ed, Leo and Max share the same interests and almost the
same sites (the only difference is that Leo has not bookmarked hockeycanada.ca).

The idea of clustering here is represented by the tricluster
$T = (\{u_1, u_2, u_3\}, \{i_1, i_2\}, \{s_1, s_2, s_3, s_4\})$ with density $\rho = 11/24 \approx 0.46$.

It is just one pattern to analyze instead of 11 in the case of triples.
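The figures quoted in this example can be rechecked with a short Python sketch. Table 1 is not reproduced in this text, so the relation below is an assumption: it supposes that every user bookmarked the two soccer sites under the soccer interest and the two hockey sites under the hockey interest, except that Leo has not bookmarked hockeycanada.ca. This guess is merely consistent with the counts stated above and may differ from the original table.

```python
from itertools import product

users = ["Ed", "Leo", "Max"]
interests = ["soccer", "hockey"]
sites = ["soccer.com", "nhl.com", "fifa.com", "hockeycanada.ca"]

# ASSUMPTION: the assignment of sites to interests is a guess, not Table 1.
sites_by_interest = {
    "soccer": ["soccer.com", "fifa.com"],
    "hockey": ["nhl.com", "hockeycanada.ca"],
}

# Hypothetical ternary relation Y built from the assumption above.
Y = {
    (u, i, s)
    for u in users
    for i in interests
    for s in sites_by_interest[i]
}
Y.discard(("Leo", "hockey", "hockeycanada.ca"))  # Leo has not bookmarked it

all_cells = list(product(users, interests, sites))  # the whole box U x I x S
rho = len(Y) / float(len(all_cells))                # density of the tricluster T
```

Under this assumption the box contains 24 cells, 11 of which are in the relation, which gives the density 11/24, about 0.46.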

5 Implemented algorithms 

The algorithms described below were implemented in Python 2.7.3 on a
2-processor machine (Core i3-370M, 2.4 GHz) with 3.87 GB RAM. One can find
a description of testing procedure for these algorithms in m and m- 

5.1 OAC-prime triclustering algorithm 

The core of the algorithm is quite simple: for every incidence $(g, m, b) \in I$
of a triadic context $K = (G, M, B, I)$ we build a tricluster $T = ((m, b)', (g, b)', (g, m)')$.
If the tricluster has not been generated before and its density exceeds a predefined
minimum threshold, it is added to the collection of triclusters. Pseudocode of the
algorithm for OAC-triclustering based on prime operators is presented below (Algorithm 1).
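The loop described above can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function name `oac_prime_triclusters`, the use of `frozenset` tuples as dictionary keys in place of an explicit hash value, and the precomputed prime dictionaries are all assumptions of this sketch.

```python
from collections import defaultdict


def oac_prime_triclusters(I, rho_min=0.0):
    """Return {tricluster: density} for distinct OAC-prime triclusters
    whose density is greater than rho_min.

    I is an iterable of (object, attribute, condition) triples.
    """
    I = set(I)
    # Precompute the three prime operators in one pass over I.
    prime_mb = defaultdict(set)  # (m, b) -> objects
    prime_gb = defaultdict(set)  # (g, b) -> attributes
    prime_gm = defaultdict(set)  # (g, m) -> conditions
    for g, m, b in I:
        prime_mb[(m, b)].add(g)
        prime_gb[(g, b)].add(m)
        prime_gm[(g, m)].add(b)

    def density(A, B, C):
        hits = sum((x, y, z) in I for x in A for y in B for z in C)
        return hits / float(len(A) * len(B) * len(C))

    triclusters = {}
    for g, m, b in I:
        T = (frozenset(prime_mb[(m, b)]),
             frozenset(prime_gb[(g, b)]),
             frozenset(prime_gm[(g, m)]))
        # A tuple of frozensets is hashable, so identical triclusters
        # generated by different triples are stored only once.
        if T not in triclusters:
            rho = density(*T)
            if rho > rho_min:
                triclusters[T] = rho
    return triclusters
```

Keying the dictionary by the tricluster itself plays the role of the uniqueness check; the naive density computation enumerates every cell of the box and is meant only for small contexts.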

5.2 Recommender algorithm based on triclustering 

The recommender algorithm applied to the sets of a tricontext is analogous to
a previously described one. It takes as input a context of three sets (objects,
attributes, conditions) and the set of triclusters obtained as a result of the
OAC-prime triclustering algorithm. For each user, the tricluster most
similar to the triples containing this user is selected. The similarity between a
tricluster and the user's triples is defined by the function $sim_u(Tr_i)$, where
$T_u$ and $R_u$ denote the tags and resources that occur in triples with user $u$, and
$T_{Tr_i}$ and $R_{Tr_i}$ are the tag and resource sets of the tricluster $Tr_i$. The
algorithm returns the sets $Tag_{rec}$ and $Res_{rec}$, the tag and resource
recommendations for all users.

Algorithm 1 OAC-triclustering based on prime operators
Input: $K = (G, M, B, I)$ - tricontext,
       $\rho_{\min}$ - density threshold
Output: $T_{dic}$ - a dictionary of triclusters $(X_1, X_2, X_3)$ with $X_1 \subseteq G$, $X_2 \subseteq M$, $X_3 \subseteq B$

for $(g, m, b) \in I$ do
    $T = ((m, b)', (g, b)', (g, m)')$
    $hashKey = hash(T)$
    if $hashKey \notin T_{dic}.keys()$ and $\rho(T) > \rho_{\min}$ then
        $T_{dic}[hashKey] = T$
    end if
end for

Algorithm 2 Recommender algorithm
Input: $K = (U, T, R, I)$ - tricontext, $Tr$ - a set of triclusters
Output: $Tag_{rec}$, $Res_{rec}$ - sets of recommended tags and resources

for $u \in U$ do
    for $i = 1, \ldots, |Tr|$ do
        $sim_u(Tr_i) = \frac{1}{2} \left( \frac{|R_u \cap R_{Tr_i}|}{|R_u \cup R_{Tr_i}|} + \frac{|T_u \cap T_{Tr_i}|}{|T_u \cup T_{Tr_i}|} \right)$
    end for
    $Tr_{best} = \arg\max_i \, sim_u(Tr_i)$
    $Tag_{rec}[u] = T_{Tr_{best}} \setminus T_u$
    $Res_{rec}[u] = R_{Tr_{best}} \setminus R_u$
end for
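A recommender of this kind can be sketched in Python as shown below. The sketch assumes that the similarity is the average of two Jaccard coefficients, one over resources and one over tags, and that the recommendation is whatever the best tricluster contains beyond the user's own tags and resources; the names `jaccard` and `recommend` are hypothetical, and the list of triclusters is assumed to be non-empty.

```python
def jaccard(X, Y):
    """|X & Y| / |X | Y|, taken as 0 when both sets are empty."""
    union = X | Y
    return len(X & Y) / float(len(union)) if union else 0.0


def recommend(I, triclusters):
    """I: set of (user, tag, resource) triples.
    triclusters: non-empty list of (users, tags, resources) triples of sets.
    Returns two dicts: user -> recommended tags, user -> recommended resources.
    """
    tag_rec, res_rec = {}, {}
    users = {u for (u, _, _) in I}
    for u in users:
        T_u = {t for (x, t, _) in I if x == u}  # tags already used by u
        R_u = {r for (x, _, r) in I if x == u}  # resources already used by u

        def sim(tr):
            _, tags, resources = tr
            return 0.5 * (jaccard(R_u, set(resources)) + jaccard(T_u, set(tags)))

        best = max(triclusters, key=sim)  # ties are resolved by list order
        tag_rec[u] = set(best[1]) - T_u
        res_rec[u] = set(best[2]) - R_u
    return tag_rec, res_rec
```

A user whose best tricluster adds nothing new simply receives empty recommendation sets.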

6 The challenge and visual tricluster analysis framework 

The challenge of triclustering (as of clustering in general) is
to output meaningful, well-interpretable clusters. The term "meaningful"
is not formally defined and is used by people to express their own subjective
opinion on how well the task of clustering is solved, i.e. how similar the objects
within the same cluster are, how distant the objects in different clusters are, how
well the result corresponds to real-world problems, etc. Therefore, an expert opinion
might be useful here, and the prototype of a visual analytics framework described below
provides visual feedback to the expert and gives them the ability to explore clusters in detail.
Fig. 2. Highlighting the largest tricluster for a user-tag pair (u6, t4)


In Fig. 2 we can see a map of triclusters produced by the algorithm for a
context of 20 users, 20 tags, and 200 resources. The map is projected onto the
User-Tag plane. The more triclusters a certain user-tag pair is present in, the darker
the corresponding square. The user-tag pair (u6, t12), for instance, is included in
73 triclusters (a dark red square), while (u5, t9) is included in just 1 (a red square), and
no tricluster contains the pair (u9, t10) (a grey square).
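The numbers behind such a map can be obtained by counting, for every user-tag pair, the triclusters that contain it. The following Python sketch shows one way to do this; the function name `user_tag_counts` and the sample triclusters are hypothetical and do not come from the described prototype.

```python
from collections import Counter


def user_tag_counts(triclusters):
    """Count, for every (user, tag) pair, how many triclusters contain it.

    triclusters: iterable of (users, tags, resources) triples of sets.
    """
    counts = Counter()
    for users, tags, _resources in triclusters:
        for u in users:
            for t in tags:
                counts[(u, t)] += 1
    return counts


# Hypothetical triclusters, for illustration only.
triclusters = [
    ({"u1", "u2"}, {"t1"}, {"r1", "r2"}),
    ({"u1"}, {"t1", "t2"}, {"r1"}),
]
counts = user_tag_counts(triclusters)
# counts[("u1", "t1")] is 2 and counts[("u2", "t1")] is 1; a pair that occurs
# in no tricluster, e.g. ("u2", "t2"), has count 0 (a grey square on the map).
```

The counts can then be mapped to a colour scale, with zero shown in grey and larger values in darker shades.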
Fig. 3. Recommended tags for several users


All triclusters including a certain user-tag pair can be listed by clicking on the
"Triclusters" menu label. Triconcepts can be listed similarly. One can also
highlight the biggest tricluster containing a certain user-tag pair or output all
triclusters of the initial context ordered by density. Moreover, through the
"Recommend attributes" context menu option an analyst can display the results of
the recommender algorithm for a certain user (in this case, the recommended tags).

The tool is intended to help an analyst to grasp the results of triclustering 
and recommender algorithms, and to make decisions on meaningfulness of certain 
triclusters and recommendations. The map helps the expert to quickly detect the 
concentrated regions (dark squares) and visualize dense triclusters including the 
corresponding triples. Further, it helps to make the decision whether the selected 
dense tricluster is meaningful or not, i.e. if it really combines similar users, tags, 
and resources. 

7 Further work 

There are several important issues to be regarded: 

1. Limited human contribution: the human contribution to triclustering in this
visual analytics approach is limited and might only reach some hundreds of
decisions on certain triclusters (a thousand is less plausible). Therefore, a machine
learning approach might help to train an algorithm to classify meaningful
clusters. The distance metric on triclusters should be carefully chosen.

2. Scalability: the issue of scalability is quite challenging for the described
technique and is yet to be solved. In its current state, the application can support only
contexts with one long dimension, for instance, a context of 20 users, 20
tags, and 400000 resources which can be projected onto a user-tag plane.
One possible way to address the scalability issue is to perform preliminary
clustering of objects, attributes, and conditions separately, and then choose
representatives from each cluster.

3. Extending the idea of a human-machine approach to other problems in FCA
or data mining, such as exploring implications and association rules in order
to find meaningful ones.

8 Conclusion 

Visual analytics, as one of the flourishing domains of data analysis, can be
useful in mining objects with similar attributes under similar conditions in the
context of social network data. A special algorithm was developed for uniting
such objects, attributes, and conditions in triclusters. The program framework
under development is intended to graphically display the results of this algorithm
and to empower an analyst to decide on the meaningfulness of clusters and of tag
or resource recommendations for objects.